I've loaded in my train and validation sets from CIFAR10 like so:
train = tfds.load('cifar10', split='train[:90%]', shuffle_files=True)
validation = tfds.load('cifar10', split='train[-10%:]', shuffle_files=True)
I've created the architecture for my CNN
model = ...
Now I'm trying to use model.fit() to train my model but I don't know how to seperate out the 'image' and 'label' from my objects. Train and Validation look like this:
print(train) # same layout as the validation set
<_OptionsDataset shapes: {id: (), image: (32, 32, 3), label: ()}, types: {id: tf.string, image: tf.uint8, label: tf.int64}>
My naive approach would be this but those OptionsDatasets are not subscript-able.
history = model.fit(train['image'], train['label'], epochs=100, batch_size=64, validation_data=(validation['image'], test['label'], verbose=0)
We can do this as follows
import tensorflow as tf
import tensorflow_datasets as tfds
def normalize(img, label):
img = tf.cast(img, tf.float32) / 255.
return (img, label)
ds = tfds.load('mnist', split='train', as_supervised=True)
ds = ds.shuffle(1024).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
ds = ds.map(normalize)
for i in ds.take(1):
print(i[0].shape, i[1].shape)
# (32, 28, 28, 1) (32,)
Use as_supervised=True for returning an image, label as tuple
Use .map() for applying preprocessing funciton or even augmentation.
Model
# declare input shape
input = tf.keras.Input(shape=(28,28,1))
# Block 1
x = tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu")(input)
# Now that we apply global max pooling.
gap = tf.keras.layers.GlobalMaxPooling2D()(x)
# Finally, we add a classification layer.
output = tf.keras.layers.Dense(10, activation='softmax')(gap)
# bind all
func_model = tf.keras.Model(input, output)
Compile and Run
print('\nFunctional API')
func_model.compile(
metrics=['accuracy'],
loss= 'sparse_categorical_crossentropy', # labels are integer (not one-hot)
optimizer = tf.keras.optimizers.Adam()
)
func_model.fit(ds)
# 1875/1875 [==============================] - 15s 7ms/step - loss: 2.1782 - accuracy: 0.2280
Tensorflow knows how to handle the tfds objects. So in your case you can just do
history = model.fit(train, epochs=100, batch_size=64, validation_data=(validation, verbose=0)
No need to split out the labels from the images. But if you really want to you can do the following
labels = []
for image, label in tfds.as_numpy(train):
labels.append(label)
Related
I need help with some Machine Learning code of mine.
I am using the base of a pretrained model by google and adding the last flattening and output layers of my own.
After training the model and saving it, I commented out the training part of the code and loaded and used the saved model.
my code:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt
import tensorflow_datasets as tfds
tfds.disable_progress_bar()
class_names = ['cat','dog']
#load the images
(raw_train, raw_validation, raw_test), metadata = tfds.load(
'cats_vs_dogs',
split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
with_info=True,
as_supervised=True,
)
get_label_name = metadata.features['label'].int2str
#function to resize image
IMG_SIZE=160
def format_example(image, label):
image = tf.cast(image, tf.float32) #cast to convert integer values in pixels to float32
image = (image/127.5) - 1
image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
return image,label
#.map is to apply a function to all raw images
train = raw_train.map(format_example)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000
train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)
#Create base model from pretrained model MobileNetV2
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
#freezing the base (preventing any more training)
base_model.trainable = False
#Flattening it down
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
#Adding dense layer of only 1 neuron as only 2 possibilities(cat/dog)
prediction_layer = keras.layers.Dense(1)
model = tf.keras.Sequential([
base_model,
global_average_layer,
prediction_layer
])
#slow learning rate since it is a pretrained model
base_learning_rate = 0.0001
#Compiling the model
model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=base_learning_rate),
loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
metrics=['accuracy'])
initial_epochs = 3
validation_steps=20
loss0,accuracy0 = model.evaluate(validation_batches, steps = validation_steps)
print('training...')
history = model.fit(train_batches,
epochs=initial_epochs,
validation_data = validation_batches)
acc = history.history['accuracy']
print(acc)
model.save("dogs_vs_cats.h5")
model = tf.keras.models.load_model('dogs_vs_cats.h5', compile=True)
print('predicting... ')
prediction = model.predict(test, verbose=1)
print(prediction)
I am getting the following error:
ValueError: Input 0 of layer "sequential" is incompatible with the layer: expected shape=(None, 160, 160, 3), found shape=(160, 160, 3)
Do I understand correctly that you get this error only after you comment training code and run lines starting from model = tf.keras.models.load_model('dogs_vs_cats.h5', compile=True)? If yes, I think that you have a mistake in this line:
prediction = model.predict(test, verbose=1)
You use test with 3D-images, but not batched test_batches with 4D-images, this is why you obtain an error with 3D/4D shapes.
I'm trying to use Imagenet V2 with transfer-learning for multiclass classification (6 classes), but getting the following error. Can anyone please help?
ValueError: `logits` and `labels` must have the same shape, received ((None, 6) vs (None, 1)).
I borrowed this code from Andrew Ng's CNN course I took a while back but the original code was for binary classification. I tried to modify it for multiclass classification but got this error. Here's my code:
import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf
import tensorflow.keras.layers as tfl
import datetime
from tensorflow.keras.preprocessing import image_dataset_from_directory
from tensorflow.keras.layers.experimental.preprocessing import RandomFlip, RandomRotation
BATCH_SIZE = 16
IMG_SIZE = (160, 160)
training_directory = "/content/drive/MyDrive/Microscopy Data/04112028_multiclass_maiden/Training/Actin"
validation_directory = "/content/drive/MyDrive/Microscopy Data/04112028_multiclass_maiden/Validation/Actin"
train_dataset = image_dataset_from_directory(training_directory,
shuffle=True,
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
seed=42)
validation_dataset = image_dataset_from_directory(validation_directory,
shuffle=True,
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
seed=42)
Output:
Found 600 files belonging to 6 classes.
Found 600 files belonging to 6 classes.
Code Continued...
class_names = train_dataset.class_names
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
IMG_SHAPE = IMG_SIZE + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=True,
weights='imagenet')
def huvec_model (image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
''' Define a tf.keras model for binary classification out of the MobileNetV2 model
Arguments:
image_shape -- Image width and height
data_augmentation -- data augmentation function
Returns:
Returns:
tf.keras.model
'''
input_shape = image_shape + (3,)
# Freeze the base model by making it non trainable
# base_model.trainable = None
# create the input layer (Same as the imageNetv2 input size)
# inputs = tf.keras.Input(shape=None)
# apply data augmentation to the inputs
# x = None
# data preprocessing using the same weights the model was trained on
# x = preprocess_input(None)
# set training to False to avoid keeping track of statistics in the batch norm layer
# x = base_model(None, training=None)
# Add the new Binary classification layers
# use global avg pooling to summarize the info in each channel
# x = None()(x)
#include dropout with probability of 0.2 to avoid overfitting
# x = None(None)(x)
# create a prediction layer with one neuron (as a classifier only needs one)
# prediction_layer = None
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = False
inputs = tf.keras.Input(shape=input_shape)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tfl.Dropout(.2)(x)
prediction_layer = tf.keras.layers.Dense(units = len(class_names), activation='softmax')
# YOUR CODE ENDS HERE
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)
return model
model2 = huvec_model(IMG_SIZE)
base_model.trainable = True
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))
# Fine-tune from this layer onwards
fine_tune_at = 120
# Freeze all the layers before the `fine_tune_at` layer
# for layer in base_model.layers[:fine_tune_at]:
# layer.trainable = None
# Define a BinaryCrossentropy loss function. Use from_logits=True
# loss_function=None
# Define an Adam optimizer with a learning rate of 0.1 * base_learning_rate
# optimizer = None
# Use accuracy as evaluation metric
# metrics=None
base_learning_rate = 0.01
# YOUR CODE STARTS HERE
for layer in base_model.layers[:fine_tune_at]:
layer.trainable = False
loss_function=tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer= tf.keras.optimizers.Adam(learning_rate=0.1*base_learning_rate)
metrics=['accuracy']
# YOUR CODE ENDS HERE
model2.compile(loss=loss_function,
optimizer = optimizer,
metrics=metrics)
initial_epochs = 5
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs)
Looks like you yet have to one-hot-encode your labels, i.e. instead of having number i (between 0 and 5, inclusive) for a label of an image that belongs to the i-th class, which is of shape (None, 1), provide an array of all 0's except a 1 at index i, which is of shape (None, 6). Then labels has the same shape as logits.
It is easy you need to match the logits output or you need to remove softmax or distribution at the end of the model.
Almost correct, I change a bit on un-defined data_augmentation that is working.
It will have the output but the calculation is based on output expectation try to use meanquears you see errors or use class entropy that will provide different behavior.
Somebody told it boosted up accuracy output as they are using Binary cross entropy but not this way, it will highly boosted up when using with sequences of binary see the example of ALE games ( Street Fighters )
[ Sample ]:
import os
from os.path import exists
import tensorflow as tf
import matplotlib.pyplot as plt
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
BATCH_SIZE = 16
IMG_SIZE = (160, 160)
PATH = 'F:\\datasets\\downloads\\sample\\cats_dogs\\training'
training_directory = os.path.join(PATH, 'train')
validation_directory = os.path.join(PATH, 'validation')
train_dataset = tf.keras.utils.image_dataset_from_directory(training_directory,
shuffle=True,
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
seed=42)
validation_dataset = tf.keras.utils.image_dataset_from_directory(validation_directory,
shuffle=True,
batch_size=BATCH_SIZE,
image_size=IMG_SIZE,
seed=42)
class_names = train_dataset.class_names
print( "class_names: " + str( class_names ) )
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Functions
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
def huvec_model (image_shape=IMG_SIZE, data_augmentation = tf.keras.Sequential([ tf.keras.layers.RandomFlip('horizontal'), tf.keras.layers.RandomRotation(0.2), ])):
# def huvec_model (image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
''' Define a tf.keras model for binary classification out of the MobileNetV2 model
Arguments:
image_shape -- Image width and height
data_augmentation -- data augmentation function
Returns:
Returns:
tf.keras.model
'''
input_shape = image_shape + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
base_model.trainable = False
inputs = tf.keras.Input(shape=input_shape)
x = data_augmentation(inputs)
x = preprocess_input(x)
x = base_model(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(.2)(x)
prediction_layer = tf.keras.layers.Dense(units = len(class_names), activation='softmax')
outputs = prediction_layer(x)
model = tf.keras.Model(inputs, outputs)
return model
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
DataSet
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
AUTOTUNE = tf.data.experimental.AUTOTUNE
train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
preprocess_input = tf.keras.applications.mobilenet_v2.preprocess_input
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
IMG_SHAPE = IMG_SIZE + (3,)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=True,
weights='imagenet')
base_model.summary()
model2 = huvec_model(IMG_SIZE)
base_model.trainable = True
# Let's take a look to see how many layers are in the base model
print("Number of layers in the base model: ", len(base_model.layers))
# Fine-tune from this layer onwards
fine_tune_at = 120
base_learning_rate = 0.01
for layer in base_model.layers[:fine_tune_at]:
layer.trainable = False
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Optimizer
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1*base_learning_rate)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Loss Fn
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
lossfn = tf.keras.losses.BinaryCrossentropy(from_logits=False)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Summary
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model2.compile(optimizer=optimizer, loss=lossfn, metrics=[ 'accuracy' ])
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Training
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=5)
input('...')
[ Output ]
...
Found the bug:
I had to redefine loss in the model compiler as loss='sparse_categorical_crossentropy' which was initially defined as loss=tf.keras.losses.BinaryCrossentropy(from_logits=True).
Refer to the SO thread Changing Keras Model from Binary Classification to Multi-classification for more details.
I am working on a small project where i wanted to classify some images by training them within a convolutional network, with a pretrained VGG16 network.
These are the steps i took:
#Setting Conditions
img_size = (180,180)
batch_size = 600
num_classes = len(class_names)
#Using Keras built in preprocessing method which facilitates the preprocessing instead of having to do it manually.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
"Data/Train",
validation_split = 0.2,
subset = "training",
seed = 1337, # creating a seed so t
image_size = img_size,
batch_size = batch_size,
)
valid_ds = tf.keras.preprocessing.image_dataset_from_directory(
"Data/Train",
validation_split = 0.2,
subset = "validation",
seed = 1337, # creating a seed so t
image_size = img_size,
batch_size = batch_size,
)
BUILDING MODEL
#Creating Model
model = Sequential()
#ADDING The VGG16 Pre trained network
model.add(VGG16(pooling ='avg',weights="imagenet", include_top=False))
#adding a dense layer
model.add(Dense(num_classes,activation = 'softmax'))
#Setting the trainable parameter for VGG16 to false, as we want to use this pretrained network, and train the new images.
model.layers[0].trainable = False
#The compile() method: specifying a loss, metrics, and an optimizer To train a model with fit(), #you need to specify a loss function, an optimizer, and optionally, some metrics to monitor.
#You pass these to the model as arguments to the compile() method
model.compile(optimizer = 'adam',loss = 'categorical_crossentropy', metrics =['accuracy'])
epoch_train = len(train_ds)
opoch_val = len(valid_ds)
numbers_epochs = 2
fit_model = model.fit(train_ds, steps_per_epoch = epoch_train,verbose = 1, validation_data = valid_ds, validation_steps = opoch_val,)
When i try to fit the model, i get the following error:
ValueError: Shapes (None, 1) and (None, 43) are incompatible
If there is an expert out there to call out what i did wrong or what steps i skipped... I would be very greatful!
You need to set label_mode='categorical in image_dataset_from_directory if you plan to train your model with categorical_crossentropy. By default, the image_dataset_from_directory method assumes that the labels are encoded as integers (label_mode='int'), meaning you should being using the sparse_categorical_crossentropy loss. Here is a working example:
import tensorflow as tf
import pathlib
import matplotlib.pyplot as plt
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)
batch_size = 32
num_classes = 5
train_ds = tf.keras.utils.image_dataset_from_directory(data_dir, shuffle=True, batch_size=batch_size, label_mode='categorical')
model = tf.keras.Sequential()
model.add(tf.keras.applications.VGG16(pooling ='avg',weights="imagenet", include_top=False))
model.add(tf.keras.layers.Dense(num_classes, activation = 'softmax'))
model.layers[0].trainable = False
model.compile(optimizer = 'adam',loss = 'categorical_crossentropy', metrics =['accuracy'])
numbers_epochs = 2
fit_model = model.fit(train_ds, epochs=numbers_epochs)
Trying to train a Resnet50 model from it's pre trained form but once it hits the code of training it, it throws this error ValueError: Shapes (None, 5) and (None, 1000) are incompatible I can't figure out what am I doing wrong here?
I have 5 classes dataset so that's why I am using categorical_crossentropy as the loss.
Full code is here:
# This is in for loop until labels.append
#load the image, pre-process it, and store it in the data list
image = cv2.imread(imagePath)
image = cv2.resize(image, (224, 224))
image = img_to_array(image)
data.append(image)
# extract the class label from the image path and update the
# labels list
label = imagePath.split(os.path.sep)[-2]
labels.append(label)
print("[INFO] ...reading the images completed", "+ Label class extracted.")
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
print("[INFO] data matrix: {:.2f}MB".format(
data.nbytes / (1024 * 1000.0)))
# binarize the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels, test_size=0.2, random_state=42)
model = ResNet50()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print("[INFO] Model compilation completed.")
# train the network
print("[INFO] training network...")
H = model.fit(trainX, trainY, batch_size=BS,
validation_data=(testX, testY),
steps_per_epoch=len(trainX) // BS,
epochs=EPOCHS, verbose=1)
Try adding a layer with the proper number of categories for your task:
base = ResNet50(include_top=False, pooling='avg')
out = K.layers.Dense(5, activation='softmax')
model = K.Model(inputs=base.input, outputs=out(base.output))
You have used the fully connected layers of the pre-trained ResNet, you need to create your appropriate classification layers suitable for your task.
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras import Model
model = ResNet50(include_top=False)
f_flat = GlobalAveragePooling2D()(model.output)
fc = Dense(units=2048,activation="relu")(f_flat)
logit = Dense(units=5, activation="softmax")(fc)
model = Model(model.inputs,logit)
I'm relatively new to neural nets so please excuse my ignorance. I'm trying to adapt the keras BLSTM example here. The example reads in texts and classifies them as 0 or 1. I want a BLSTM that does something very much like POS tagging, though extras like lemmatizing or other advanced features are not neccessary, I just want a basic model. My data is a list of sentences and each word is given a category 1-8. I want to train a BLSTM that can use this data to predict the category for each word in an unseen sentence.
e.g. input = ['The', 'dog', 'is', 'red'] gives output = [2, 4, 3, 7]
If the keras example is not the best route, I'm open to other suggestions.
I currently have this:
'''Train a Bidirectional LSTM.'''
from __future__ import print_function
import numpy as np
from keras.preprocessing import sequence
from keras.models import Model
from keras.layers import Dense, Dropout, Embedding, LSTM, Input, merge
from prep_nn import prep_scan
np.random.seed(1337) # for reproducibility
max_features = 20000
batch_size = 16
maxlen = 18
print('Loading data...')
(X_train, y_train), (X_test, y_test) = prep_scan(nb_words=max_features,
test_split=0.2)
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')
print("Pad sequences (samples x time)")
# type issues here? float/int?
X_train = sequence.pad_sequences(X_train, value=0.)
X_test = sequence.pad_sequences(X_test, value=0.) # pad with zeros
print('X_train shape:', X_train.shape)
print('X_test shape:', X_test.shape)
# need to pad y too, because more than 1 ouput value, not classification?
y_train = sequence.pad_sequences(np.array(y_train), value=0.)
y_test = sequence.pad_sequences(np.array(y_test), value=0.)
print('y_train shape:', X_train.shape)
print('y_test shape:', X_test.shape)
# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')
# this embedding layer will transform the sequences of integers
# into vectors of size 128
embedded = Embedding(max_features, 128, input_length=maxlen)(sequence)
# apply forwards LSTM
forwards = LSTM(64)(embedded)
# apply backwards LSTM
backwards = LSTM(64, go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.5)(merged)
# number after dense has to corresponse to output matrix?
output = Dense(17, activation='sigmoid')(after_dp)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])
print('Train...')
model.fit(X_train, y_train,
batch_size=batch_size,
nb_epoch=4,
validation_data=[X_test, y_test])
X_test_new = np.array([[0,0,0,0,0,0,0,0,0,12,3,55,4,34,5,45,3,9],[0,0,0,0,0,0,0,1,7,65,34,67,34,23,24,67,54,43,]])
classes = model.predict(X_test_new, batch_size=16)
print(classes)
My output is the right dimension, but is giving me floats 0-1. I think this is because it's still looking for binary classfication. Anyone know how to fix this?
SOLVED
Just make sure the labels are each binary arrays:
(X_train, y_train), (X_test, y_test), maxlen, word_ids, tags_ids = prep_model(
nb_words=nb_words, test_len=75)
W = (y_train > 0).astype('float')
print(len(X_train), 'train sequences')
print(int(len(X_train)*val_split), 'validation sequences')
print(len(X_test), 'heldout sequences')
# this is the placeholder tensor for the input sequences
sequence = Input(shape=(maxlen,), dtype='int32')
# this embedding layer will transform the sequences of integers
# into vectors of size 256
embedded = Embedding(nb_words, output_dim=hidden,
input_length=maxlen, mask_zero=True)(sequence)
# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,
go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = merge([forwards, backwards], mode='concat', concat_axis=-1)
after_dp = Dropout(0.15)(merged)
# TimeDistributed for sequence
# change activation to sigmoid?
output = TimeDistributed(
Dense(output_dim=nb_classes,
activation='softmax'))(after_dp)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
model.compile(loss='categorical_crossentropy',
metrics=['accuracy'], optimizer='adam',
sample_weight_mode='temporal')
print('Train...')
model.fit(X_train, y_train,
batch_size=batch_size,
nb_epoch=epochs,
shuffle=True,
validation_split=val_split,
sample_weight=W)
Solved. The main issue was reshaping the data for the classification categories as binary arrays. Also used TimeDistributed and set return_sequences to True.
I knows that this thread is very old but i hope will i can help.
I modified the model for a binary model:
sequence = Input(shape=(X_train.shape[1],), dtype='int32')
embedded = Embedding(max_fatures,embed_dim,input_length=X_train.shape[1], mask_zero=True)(sequence)
# apply forwards LSTM
forwards = LSTM(output_dim=hidden, return_sequences=True)(embedded)
# apply backwards LSTM
backwards = LSTM(output_dim=hidden, return_sequences=True,go_backwards=True)(embedded)
# concatenate the outputs of the 2 LSTMs
merged = concatenate([forwards, backwards])
after_dp = Dropout(0.15)(merged)
# add now layer LSTM without return_sequence
lstm_normal = LSTM(hidden)(merged)
# TimeDistributed for sequence
# change activation to sigmoid?
#output = TimeDistributed(Dense(output_dim=2,activation='sigmoid'))(after_dp)
#I changed output layer TimeDistributed for a Dense, for the problem of dimensionality and output_dim = 1 (output binary)
output = Dense(output_dim=1,activation='sigmoid')(lstm_normal)
model = Model(input=sequence, output=output)
# try using different optimizers and different optimizer configs
# loss=binary_crossentropy, optimizer=rmsprop
# I changed modelo compile by to binary and remove sample_weight_mode parameter
model.compile(loss='binary_crossentropy',
metrics=['accuracy'], optimizer='adam',
)
print(model.summary())
###################################
#this is the line of training
model.fit(X_train, Y_train,
batch_size=128,
epochs=10,
shuffle=True,
validation_split=0.2,
#sample_weight=W
)
#In this moment work fine.....
Train on 536000 samples, validate on 134000 samples
Epoch 1/10
536000/536000 [==============================] - 1814s 3ms/step - loss: 0.4794 - acc: 0.7679 - val_loss: 0.4624 - val_acc: 0.7784
Epoch 2/10
536000/536000 [==============================] - 1829s 3ms/step - loss: 0.4502 - acc: 0.7857 - val_loss: 0.4551 - val_acc: 0.7837
Epoch 3/10
99584/536000 [====>.........................] - ETA: 23:10 - loss: 0.4291 - acc: 0.7980