I have a dataset of 50,000 items: reviews and their sentiment (positive or negative).
I distributed 90% to the training set and the rest to the test set.
My question is: if I run 5 epochs on the training set that I have, shouldn't each epoch load 45,000 samples instead of 1407?
# imports assumed from context; preprocessing.normalize is taken to be scikit-learn's
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn import preprocessing

# divide the data into train & test sets
test_sample_size = int(0.1 * len(preprocessed_reviews))  # 10% of the data as the test set
# encode the sentiment labels: 1 for positive, 0 for negative
sentiment = [1 if x == 'positive' else 0 for x in sentiment]
# separate the data into train & test sets
X_test, X_train = (np.array(preprocessed_reviews[:test_sample_size]),
                   np.array(preprocessed_reviews[test_sample_size:]))
y_test, y_train = (np.array(sentiment[:test_sample_size]),
                   np.array(sentiment[test_sample_size:]))
tokenizer = Tokenizer(oov_token='<OOV>') # for the unknown words
tokenizer.fit_on_texts(X_train)
vocab_count = len(tokenizer.word_index) + 1 # +1 is for padding
training_sequences = tokenizer.texts_to_sequences(X_train) # tokenizer.word_index to see indexes
training_padded = pad_sequences(training_sequences, padding='post') # pad sequences with 0s
training_normal = preprocessing.normalize(training_padded) # normalize data
testing_sequences = tokenizer.texts_to_sequences(X_test)
testing_padded = pad_sequences(testing_sequences, padding='post')
testing_normal = preprocessing.normalize(testing_padded)
input_length = len(training_normal[0]) # length of all sequences
# build a model
model = keras.models.Sequential()
model.add(keras.layers.Embedding(input_dim=vocab_count, output_dim=2,input_length=input_length))
model.add(keras.layers.GlobalAveragePooling1D())
model.add(keras.layers.Dense(63, activation='relu')) # hidden layer
model.add(keras.layers.Dense(16, activation='relu')) # hidden layer
model.add(keras.layers.Dense(1, activation='sigmoid')) # output layer
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(training_normal, y_train, epochs=5)
Output:
Epoch 1/5
1407/1407 [==============================] - 9s 7ms/step - loss: 0.6932 - accuracy: 0.4992
Epoch 2/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.5030
Epoch 3/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.4987
Epoch 4/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.5024
Epoch 5/5
1407/1407 [==============================] - 9s 6ms/step - loss: 0.6932 - accuracy: 0.5020
Sorry, I'm quite new to TensorFlow; I hope someone can help out!
If you have around 50,000 data points split with a 90/10 ratio (train/test), that means ~45,000 of them are training data and the remaining ~5,000 are for testing.
When you call the fit method, Keras uses a default batch_size of 32 (you can always change that to 64, 128, ...).
So the number 1407 tells you that the model performs 1407 forward and backpropagation steps (one per batch) before one full epoch is completed, because ceil(45,000 / 32) = 1407.
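To illustrate the arithmetic, here is a small sketch (plain Python; it assumes the ~45,000 training samples from above, and the exact count depends on your split):
import math

train_samples = 45_000  # approximate size of the 90% training split

# steps per epoch = ceil(training samples / batch size)
for batch_size in (32, 64, 128):
    print(batch_size, "->", math.ceil(train_samples / batch_size))
# 32 -> 1407, 64 -> 704, 128 -> 352
So passing, for example, model.fit(training_normal, y_train, epochs=5, batch_size=64) would show 704 steps per epoch instead of 1407.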
I'm trying to create a small transformer model with Keras to model stock prices, based on this tutorial from the Keras docs. The problem is, my test loss is massive and barely changes between epochs, unsurprisingly resulting in severe underfitting, with all of my outputs being the same arbitrary value.
My code is below:
# imports assumed from context
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def transformer_encoder_block(inputs, head_size, num_heads, filters, dropout=0):
    # Normalization and Attention
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs
    # Feed Forward Part
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=filters, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res
data = ...
input = np.array(
    keras.preprocessing.sequence.pad_sequences(data["input"], padding="pre", dtype="float32"))
output = np.array(
    keras.preprocessing.sequence.pad_sequences(data["output"], padding="pre", dtype="float32"))
# Input shape: (723, 36, 22)
# Output shape: (723, 36, 1)
# Train data
train_features = input[100:]
train_labels = output[100:]
train_labels = tf.keras.utils.to_categorical(train_labels, num_classes=3)
# Test data
test_features = input[:100]
test_labels = output[:100]
test_labels = tf.keras.utils.to_categorical(test_labels, num_classes=3)
inputs = keras.Input(shape=(None,22), dtype="float32", name="inputs")
# Ignore padding in inputs
x = layers.Masking(mask_value=0)(inputs)
x = transformer_encoder_block(x, head_size=64, num_heads=16, filters=3, dropout=0.2)
# Multiclass = Softmax (decrease, no change, increase)
outputs = layers.TimeDistributed(layers.Dense(3, activation="softmax", name="outputs"))(x)
# Create model
model = keras.Model(inputs=inputs, outputs=outputs)
# Compile model
model.compile(loss="categorical_crossentropy", optimizer=(tf.keras.optimizers.Adam(learning_rate=0.005)), metrics=['accuracy'])
# Train model
history = model.fit(train_features, train_labels, epochs=10, batch_size=32)
# Evaluate on the test data
test_loss = model.evaluate(test_features, test_labels, verbose=0)
print("Test loss:", test_loss)
out = model.predict(test_features)
After padding, input is of shape (723, 36, 22) and output is of shape (723, 36, 1) (before converting the output to one-hot, after which there are 3 output classes).
Here's an example output for ten epochs (trust me, more than ten doesn't make it better):
Epoch 1/10
20/20 [==============================] - 2s 62ms/step - loss: 10.7436 - accuracy: 0.3335
Epoch 2/10
20/20 [==============================] - 1s 62ms/step - loss: 10.7083 - accuracy: 0.3354
Epoch 3/10
20/20 [==============================] - 1s 60ms/step - loss: 10.6555 - accuracy: 0.3392
Epoch 4/10
20/20 [==============================] - 1s 62ms/step - loss: 10.7846 - accuracy: 0.3306
Epoch 5/10
20/20 [==============================] - 1s 60ms/step - loss: 10.7600 - accuracy: 0.3322
Epoch 6/10
20/20 [==============================] - 1s 59ms/step - loss: 10.7074 - accuracy: 0.3358
Epoch 7/10
20/20 [==============================] - 1s 59ms/step - loss: 10.6569 - accuracy: 0.3385
Epoch 8/10
20/20 [==============================] - 1s 60ms/step - loss: 10.7767 - accuracy: 0.3314
Epoch 9/10
20/20 [==============================] - 1s 61ms/step - loss: 10.7346 - accuracy: 0.3341
Epoch 10/10
20/20 [==============================] - 1s 62ms/step - loss: 10.7093 - accuracy: 0.3354
Test loss: [10.073813438415527, 0.375]
4/4 [==============================] - 0s 22ms/step
Using the same data on a simple LSTM model with the same shape yielded a desirable prediction with a constantly decreasing loss.
Tweaking the learning rate appears to have no effect, nor does stacking more transformer_encoder_block()s.
If anyone has any suggestions for how I can solve this, please let me know.
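For reference, here is a minimal sketch of the kind of simple LSTM baseline mentioned above (hypothetical, not the exact model used; it assumes the same masked (None, 22) inputs and 3-class softmax outputs):
lstm_inputs = keras.Input(shape=(None, 22), dtype="float32")
y = layers.Masking(mask_value=0)(lstm_inputs)
y = layers.LSTM(64, return_sequences=True)(y)  # per-timestep representation
y = layers.Dropout(0.2)(y)
lstm_outputs = layers.TimeDistributed(layers.Dense(3, activation="softmax"))(y)

lstm_model = keras.Model(lstm_inputs, lstm_outputs)
lstm_model.compile(loss="categorical_crossentropy",
                   optimizer=keras.optimizers.Adam(1e-3),
                   metrics=["accuracy"])
# lstm_model.fit(train_features, train_labels, epochs=10, batch_size=32)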
I have been given 10,000 images of shape (100, 100) representing detections of particles. I have then created 10,000 empty images of shape (100, 100) and mixed them together. I have given the two types labels of 1 (particle) and 0 (empty), as seen in the code here:
Labels = np.append(np.ones(10000), np.zeros(empty_sheets.shape[0]))
# scaling each image so that it has a maximum value of 1
images_scale1 = np.zeros(s)
l = s[0]
for i in range(l):
    images_scale1[i] = images[i]/np.amax(images[i])
empty_sheets_noise1 = add_noise(empty_sheets,0)
scale1noise1 = np.concatenate((images_scale1,empty_sheets_noise1),axis=0)
y11 = Labels
scale1noise1s, y11s = shuffle(scale1noise1, y11)
scale1noise1s_train, scale1noise1s_test, y11s_train, y11s_test = train_test_split(
    scale1noise1s, y11, test_size=0.25)
#reshaping image arrays so that they can be passed through CNN
scale1noise1s_train = scale1noise1s_train.reshape(scale1noise1s_train.shape[0],100,100,1)
scale1noise1s_test = scale1noise1s_test.reshape(scale1noise1s_test.shape[0],100,100,1)
y11s_train = y11s_train.reshape(y11s_train.shape[0],1)
y11s_test = y11s_test.reshape(y11s_test.shape[0],1)
Then to set up my model I create a new function:
def create_model():
    # initiates a new model
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(100, 100, 1)))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Dropout(0.2))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(32))
    model.add(keras.layers.Dense(64))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    return model
estimators1m1 = create_model()
estimators1m1.compile(optimizer='adam',
                      metrics=['accuracy', tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
                      loss='binary_crossentropy')
history = estimators1m1.fit(scale1noise1s_train, y11s_train, epochs=3,
                            validation_data=(scale1noise1s_test, y11s_test))
which produces the following:
Epoch 1/3
469/469 [==============================] - 62s 131ms/step - loss: 0.6939 - accuracy: 0.4917 - precision_2: 0.4905 - recall_2: 0.4456 - val_loss: 0.6933 - val_accuracy: 0.5012 - val_precision_2: 0.5012 - val_recall_2: 1.0000
Epoch 2/3
469/469 [==============================] - 63s 134ms/step - loss: 0.6889 - accuracy: 0.5227 - precision_2: 0.5209 - recall_2: 0.5564 - val_loss: 0.6976 - val_accuracy: 0.4994 - val_precision_2: 0.5014 - val_recall_2: 0.2191
Epoch 3/3
469/469 [==============================] - 59s 127ms/step - loss: 0.6527 - accuracy: 0.5783 - precision_2: 0.5764 - recall_2: 0.5887 - val_loss: 0.7298 - val_accuracy: 0.5000 - val_precision_2: 0.5028 - val_recall_2: 0.2131
I have tried more epochs and I still only manage to get about 50% validation accuracy, which is useless because the model is just predicting the same class constantly.
There can be many reasons why your model is not working. One that seems most likely is that the model is under-fitting, since accuracy is low on both the training set and the validation set, meaning that the neural network is unable to capture the pattern in the data. Hence you should consider building a slightly more complex model by adding more layers, while avoiding over-fitting with techniques like dropout. You should also find good settings by doing hyperparameter tuning.
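As a rough illustration of what "a slightly more complex model with dropout" could look like (a sketch only, assuming the same (100, 100, 1) inputs and binary labels as in the question; the layer sizes are just starting points for hyperparameter tuning):
def create_deeper_model():
    model = keras.models.Sequential()
    model.add(keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 1)))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(keras.layers.MaxPooling2D((2, 2)))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(64, activation='relu'))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Dense(1, activation='sigmoid'))
    return model
One concrete thing to check in the original model: the hidden Dense(32) and Dense(64) layers have no activation specified, so they are purely linear; adding activation='relu' there is a cheap first experiment.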
I'm trying to train a simple model for the Yelp binary classification task.
Load BERT encoder:
gs_folder_bert = "gs://cloud-tpu-checkpoints/bert/keras_bert/uncased_L-12_H-768_A-12"
bert_config_file = os.path.join(gs_folder_bert, "bert_config.json")
config_dict = json.loads(tf.io.gfile.GFile(bert_config_file).read())
bert_config = bert.configs.BertConfig.from_dict(config_dict)
_, bert_encoder = bert.bert_models.classifier_model(
    bert_config, num_labels=2)
checkpoint = tf.train.Checkpoint(model=bert_encoder)
checkpoint.restore(
    os.path.join(gs_folder_bert, 'bert_model.ckpt')).assert_consumed()
Load data:
data, info = tfds.load('yelp_polarity_reviews', with_info=True, batch_size=-1, as_supervised=True)
train_x_orig, train_y_orig = tfds.as_numpy(data['train'])
train_x = encode_examples(train_x_orig)
train_y = train_y_orig
Use BERT to embed the data:
encoder_output = bert_encoder.predict(train_x)
Set up the model:
inputs = keras.Input(shape=(768,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(8, activation='relu')(x)
outputs = keras.layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs=inputs, outputs=outputs)
sgd = SGD(lr=0.0001)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
Train:
model.fit(encoder_output[0], train_y, batch_size=64, epochs=3)
# encoder_output[0].shape === (10000, 1, 768)
# y_train.shape === (100000,)
Training results:
Epoch 1/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6921 - accuracy: 0.5455
Epoch 2/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6918 - accuracy: 0.5455
Epoch 3/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6915 - accuracy: 0.5412
Epoch 4/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6913 - accuracy: 0.5407
Epoch 5/5
157/157 [==============================] - 1s 5ms/step - loss: 0.6911 - accuracy: 0.5358
I tried different learning rates, but the main issue seems to be that training takes about 1 second per epoch and the accuracy stays at ~0.5. Am I not setting up the inputs/model correctly?
Your BERT model is not training. It has to be placed before the dense layers and trained as part of the model. The input layer has to take not BERT vectors, but the sequence of tokens cropped to max_length and padded. Here is example code: https://keras.io/examples/nlp/text_extraction_with_bert/ (see the beginning of the create_model function).
Alternatively, you can use the Trainer from transformers.
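To make the "BERT before the dense layers" idea concrete, here is a minimal sketch; it assumes the Hugging Face transformers package (the question uses the tf-models bert package instead), and max_length is an illustrative choice:
import tensorflow as tf
from tensorflow import keras
from transformers import TFBertModel

max_length = 128  # illustrative
bert = TFBertModel.from_pretrained("bert-base-uncased")

input_ids = keras.Input(shape=(max_length,), dtype=tf.int32, name="input_ids")
attention_mask = keras.Input(shape=(max_length,), dtype=tf.int32, name="attention_mask")

# BERT runs inside the model, so its weights are updated during fit()
bert_out = bert(input_ids, attention_mask=attention_mask)
pooled = bert_out.pooler_output  # (batch, 768)

x = keras.layers.Dense(64, activation="relu")(pooled)
outputs = keras.layers.Dense(1, activation="sigmoid")(x)

model = keras.Model([input_ids, attention_mask], outputs)
model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.Adam(2e-5),  # a small learning rate is typical for fine-tuning
              metrics=["accuracy"])
The inputs here are token IDs and an attention mask produced by a BERT tokenizer, not precomputed 768-dimensional vectors.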
I am trying to build an image classifier that differentiates images into Pump, Turbine, and PCB classes. I am using transfer learning from Inception V3.
Below is my code to initialize InceptionV3:
import os
from tensorflow.keras import layers
from tensorflow.keras import Model
!wget --no-check-certificate \
https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
-O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
from tensorflow.keras.applications.inception_v3 import InceptionV3
local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
pre_trained_model = InceptionV3(input_shape=(150, 150, 3),
                                include_top=False,
                                weights=None)
pre_trained_model.load_weights(local_weights_file)
for layer in pre_trained_model.layers:
    layer.trainable = False
# pre_trained_model.summary()
last_layer = pre_trained_model.get_layer('mixed7')
print('last layer output shape: ', last_layer.output_shape)
last_output = last_layer.output
Next, I connect my DNN to the pre-trained model:
from tensorflow.keras.optimizers import RMSprop
# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
x = layers.Dense(3, activation='softmax')(x)
model = Model(pre_trained_model.input, x)
model.compile(optimizer=RMSprop(lr=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
I feed in my images using ImageDataGenerator and train the model as below:
history = model.fit(
    train_generator,
    validation_data=validation_generator,
    steps_per_epoch=100,
    epochs=20,
    validation_steps=50,
    verbose=2)
However, the validation accuracy is not printed/generated after the first epoch:
Epoch 1/20
/usr/local/lib/python3.6/dist-packages/PIL/TiffImagePlugin.py:788: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
/usr/local/lib/python3.6/dist-packages/PIL/Image.py:932: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
"Palette images with Transparency expressed in bytes should be "
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 50 batches). You may need to use the repeat() function when building your dataset.
100/100 - 43s - loss: 0.1186 - accuracy: 0.9620 - val_loss: 11.7513 - val_accuracy: 0.3267
Epoch 2/20
100/100 - 41s - loss: 0.1299 - accuracy: 0.9630
Epoch 3/20
100/100 - 39s - loss: 0.0688 - accuracy: 0.9840
Epoch 4/20
100/100 - 39s - loss: 0.0826 - accuracy: 0.9785
Epoch 5/20
100/100 - 39s - loss: 0.0909 - accuracy: 0.9810
Epoch 6/20
100/100 - 39s - loss: 0.0523 - accuracy: 0.9845
Epoch 7/20
100/100 - 38s - loss: 0.0976 - accuracy: 0.9835
Epoch 8/20
100/100 - 39s - loss: 0.0802 - accuracy: 0.9795
Epoch 9/20
100/100 - 39s - loss: 0.0612 - accuracy: 0.9860
Epoch 10/20
100/100 - 40s - loss: 0.0729 - accuracy: 0.9825
Epoch 11/20
100/100 - 39s - loss: 0.0601 - accuracy: 0.9870
Epoch 12/20
100/100 - 39s - loss: 0.0976 - accuracy: 0.9840
Epoch 13/20
100/100 - 39s - loss: 0.0591 - accuracy: 0.9815
Epoch 14/20
I do not understand what is stopping the validation accuracy from being printed/generated. I get an error if I plot a graph of accuracy vs. validation accuracy, with the message:
ValueError: x and y must have same first dimension, but have shapes (20,) and (1,)
What am I missing here?
It finally worked; I'm posting my changes here in case anybody faces issues like these.
I changed the weights parameter in InceptionV3 from None to 'imagenet' and calculated my steps per epoch and validation steps as follows:
steps_per_epoch = int(np.ceil(no_of_training_images / batch_size))
validation_steps = int(np.ceil(no_of_validation_images / batch_size))
As you can see in the warning: WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 50 batches). You may need to use the repeat() function when building your dataset.
To make sure that you have at least steps_per_epoch * epochs batches, set steps_per_epoch to:
steps_per_epoch = X_train.shape[0]//batch_size
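Putting the two answers together, here is a hedged sketch of how those values feed into fit (the image counts and batch size are hypothetical; the model and generator names mirror the question):
batch_size = 20
no_of_training_images = 2000     # hypothetical
no_of_validation_images = 1000   # hypothetical

steps_per_epoch = int(np.ceil(no_of_training_images / batch_size))      # 100
validation_steps = int(np.ceil(no_of_validation_images / batch_size))   # 50

history = model.fit(
    train_generator,
    validation_data=validation_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=20,
    validation_steps=validation_steps,
    verbose=2)
With steps_per_epoch and validation_steps derived from the actual image counts, the generators are never asked for more batches than they can supply, so val_loss and val_accuracy are reported every epoch.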
I have an unbalanced dataset (only 0.06% of the data is labeled 1; the rest is labeled 0). As I researched, I had to resample the data, so I used the imblearn package to RandomUnderSample my dataset. Then I used a Keras Sequential model to create a neural network. While training, the F1 score increases to around 75% (the 1000th-epoch result is: loss: 0.5691 - acc: 0.7543 - f1_m: 0.7525 - precision_m: 0.7582 - recall_m: 0.7472), but on the test set the results are disappointing (loss: 55.35181%, acc: 79.25248%, f1_m: 0.39789%, precision_m: 0.23259%, recall_m: 1.54982%).
What I assume is that because the numbers of 1s and 0s in the training set are the same, the class_weights are both set to 1, so the network is not penalized much for wrong predictions of 1s.
I have tried techniques like reducing the number of layers, reducing the number of neurons, and using regularization and dropout, but the test-set F1 score is never more than 0.5%. What should I do? Thanks.
My neural network:
# imports assumed (standalone Keras here; use the tensorflow.keras equivalents if needed)
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers
from sklearn.utils import class_weight

# f1_m, precision_m and recall_m are custom metric functions defined elsewhere in the question
def neural_network(X, y, epochs_count=3, handle_overfit=False):
    # create the model
    model = Sequential()
    model.add(Dense(12, input_dim=len(X_test.columns), activation='relu'))
    if handle_overfit:
        model.add(Dropout(rate=0.5))
    model.add(Dense(8, activation='relu', kernel_regularizer=regularizers.l1(0.1)))
    if handle_overfit:
        model.add(Dropout(rate=0.1))
    model.add(Dense(1, activation='sigmoid'))
    # compile the model
    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['acc', f1_m, precision_m, recall_m])
    # compute weights for the classes '0' and '1' automatically from the label distribution
    class_weights = class_weight.compute_class_weight('balanced', [0, 1], y)
    print("---------------------- \n chosen class_wieghts are: ", class_weights, " \n ---------------------")
    # fit the model
    model.fit(X, y, epochs=epochs_count, batch_size=512, class_weight=class_weights)
    return model
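As a side note, Keras documents class_weight as a dict mapping class indices to weights, so converting the compute_class_weight output may be safer than passing the array directly (a sketch, assuming binary labels 0/1):
weights = class_weight.compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
class_weights = {0: weights[0], 1: weights[1]}
model.fit(X, y, epochs=epochs_count, batch_size=512, class_weight=class_weights)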
Defining the train and test sets:
train_set, test_set = train_test_split(data, test_size=0.35, random_state=0)
X_train = train_set[['..... some columns ....']]
y_train = train_set[['success']]
print('Initial dataset shape: ', X_train.shape)
rus = RandomUnderSampler(random_state=42)
X_undersampled, y_undersampled = rus.fit_sample(X_train, y_train)
print('undersampled dataset shape: ', X_undersampled.shape)
And the result is:
Initial dataset shape: (1625843, 11)
undersampled dataset shape: (1970, 11)
And finally, the call to the neural network:
print (X_undersampled.shape, y_undersampled.shape)
print (X_test.shape, y_test.shape)
model = neural_network(X_undersampled, y_undersampled, 1000, handle_overfit=True)
# evaluate the model
print("\n---------------\nEvaluated on test set:")
scores = model.evaluate(X_test, y_test)
for i in range(len(model.metrics_names)):
    print("%s: %.5f%%" % (model.metrics_names[i], scores[i]*100))
And the result is:
(1970, 11) (1970,)
(875454, 11) (875454, 1)
----------------------
chosen class_wieghts are: [1. 1.]
---------------------
Epoch 1/1000
1970/1970 [==============================] - 4s 2ms/step - loss: 4.5034 - acc: 0.5147 - f1_m: 0.3703 - precision_m: 0.5291 - recall_m: 0.2859
.
.
.
.
Epoch 999/1000
1970/1970 [==============================] - 0s 6us/step - loss: 0.5705 - acc: 0.7538 - f1_m: 0.7471 - precision_m: 0.7668 - recall_m: 0.7296
Epoch 1000/1000
1970/1970 [==============================] - 0s 6us/step - loss: 0.5691 - acc: 0.7543 - f1_m: 0.7525 - precision_m: 0.7582 - recall_m: 0.7472
---------------
Evaluated on test set:
875454/875454 [==============================] - 49s 56us/step
loss: 55.35181%
acc: 79.25248%
f1_m: 0.39789%
precision_m: 0.23259%
recall_m: 1.54982%