I looked at several answers, and was not able to see a clear solution to what I'm trying to do.
I have an LSTM for binary text classification that takes the top 40k words in a corpus, then operates on the first 50 tokens. Prepared like this:
max_words = 40000
max_review_length = 50
embedding_vector_length = 100
batch_size = 128
epochs = 10
all_texts = combo.title.tolist()
lstm_text_tokenizer = Tokenizer(num_words=max_words)
lstm_text_tokenizer.fit_on_texts(all_texts)
x_train = lstm_text_tokenizer.texts_to_sequences(x_train.title.tolist())
x_test = lstm_text_tokenizer.texts_to_sequences(x_test.title.tolist())
x_test = sequence.pad_sequences(x_test, maxlen=50)
x_train = sequence.pad_sequences(x_train, maxlen=50)
My current model looks like this:
def lstm_cnn_model(max_words, embedding_vector_length, max_review_length):
    model = Sequential()
    model.add(Embedding(max_words, embedding_vector_length, input_length=max_review_length))
    model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(LSTM(100))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    return model
I also have a 1-dimensional list of meta data, with one value per example. I may have more complex meta data to add in the future.
My question is, what is the best way to combine these two inputs in the training of the model?
It would be wise to switch to the functional API now and create a multi-input network that takes the text as well as the meta data:
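# The function name and signature are illustrative; the original snippet only showed the body.
def lstm_meta_model(max_words, embedding_vector_length, max_review_length, num_classes):
    text_in = Input(shape=(max_review_length,))
    meta_in = Input(shape=(1,)) # so 1 meta feature per review
    # You process text_in however you like; an Embedding layer is one option
    embedded_text = Embedding(max_words, embedding_vector_length)(text_in)
    text_features = LSTM(100)(embedded_text)
    merged = concatenate([text_features, meta_in]) # (samples, 101)
    text_class = Dense(num_classes, activation='softmax')(merged)
    model = Model([text_in, meta_in], text_class)
    return model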
The idea is that the functional API lets you build computation graphs that use both inputs in a non-sequential way. You can extract features from the text and features from the meta data, then merge them to see whether that improves classification. You might also want to look at how to encode the meta data before using it.
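As a minimal sketch of the encoding step, assuming the meta data is a single numeric value per example: the helper below (prepare_meta is a hypothetical name, not part of Keras) scales the values and shapes them as (samples, 1) so they match an Input(shape=(1,)) layer. The sample values are made up for illustration.

```python
import numpy as np

def prepare_meta(meta_values, mean=None, std=None):
    """Scale a 1-D list of meta values and shape it as (samples, 1)."""
    meta = np.asarray(meta_values, dtype="float32").reshape(-1, 1)
    # Use training statistics when given, so test data is scaled the same way.
    mean = meta.mean() if mean is None else mean
    std = meta.std() if std is None else std
    std = std if std > 0 else 1.0  # avoid division by zero for a constant feature
    return (meta - mean) / std, mean, std

# Hypothetical meta values, one per example.
meta_train, mean, std = prepare_meta([3.0, 10.0, 7.0, 4.0])
meta_test, _, _ = prepare_meta([5.0, 6.0], mean=mean, std=std)

print(meta_train.shape)  # (4, 1), matching Input(shape=(1,))
# The two inputs are then passed as a list, in the same order as the model's inputs:
# model.fit([x_train, meta_train], y_train, ...)
```

Reusing the training mean and standard deviation for the test set keeps both splits on the same scale.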
I am trying to write a pretty complicated neural network (at least for me) in keras that needs to combine both a common CNN structure and an LSTM/GRU layer.
Basically, I have a dataset of climatological maps of the Mediterranean sea, each map details the wind, pressure and other parameters. I am studying Medicanes (Mediterranean hurricanes) and my goal is to create a neural network that can classify each map with a label zero if there is no trace of such hurricanes or one if the map contains one.
In order to achieve that I need a network with two parts:
feature extractor (normal CNN).
temporal layer (LSTM/GRU).
The main reason for this is that each map is correlated with the previous one, because the formation and life cycle of a Medicane can take several days to complete.
Important note: the dataset is too big to be uploaded all at once so I have to work one batch at a time.
I am working with Keras and I found it pretty challenging to adapt its standard framework to my needs so I have come up with some peculiar flow to feed my data into the network.
In particular, I found it hard to pass both my batch size and my time-step parameter to the GRU layer using a more standard alternative.
This is what I tried:
I am positively sure I have overcomplicated the task, but, as I said I am not very proficient with Keras and TensorFlow.
The main problem was that I could not find a way to import the data both in a batch (for RAM reasons) and in a sequence of 10-15 pictures (to be used as the time steps in the GRU layer).
I solved this problem by importing batches of 120 maps in order (no shuffle) and turning these batches into the sequences of images I needed. I then re-batched the sequences and fed them to the model manually.
Data Import
batch_size=120
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "./Figures_1/Train",
    validation_split=None,
    subset=None,
    labels="inferred",
    label_mode="binary",
    color_mode="rgb",
    interpolation='bilinear',
    batch_size=batch_size,
    image_size=(600, 600),
    shuffle=False,
    seed=123
)
Get a sequence of Images
Here, I break down the 120 map batches into sequences of 60 observations, and I return each sequence one at a time.
sequence_lengh=60
def sequence_x(train_dataset):
    x_numpy = np.asarray(list(map(lambda x: x[0], tfds.as_numpy(train_dataset))),dtype=object)
    for element in range(0,x_numpy.shape[0]):
        for i in range(0, x_numpy.shape[0],sequence_lengh):
            x_seq = x_numpy[element][i:i+sequence_lengh]
            yield x_seq
def sequence_y(train_dataset):
    y_numpy = np.asarray(list(map(lambda x: x[1], tfds.as_numpy(train_dataset))),dtype=object)
    for element in range(0,y_numpy.shape[0]):
        for i in range(0, y_numpy.shape[0],sequence_lengh):
            y_seq = y_numpy[element][i:i+sequence_lengh]
            yield y_seq
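One thing that may be worth checking in the generators above: the inner range runs up to x_numpy.shape[0], which is the number of batches rather than the number of maps in a batch. The sketch below assumes the intent is to walk through each batch in steps of the sequence length; iter_sequences is a hypothetical helper, and the tiny 2x2 images only stand in for the real 600x600 maps.

```python
import numpy as np

def iter_sequences(batch, sequence_length):
    """Yield consecutive, non-overlapping slices of one batch."""
    # Step over the length of this batch, not over the number of batches.
    for start in range(0, len(batch), sequence_length):
        yield batch[start:start + sequence_length]

# Small stand-in for one batch of 120 maps (tiny 2x2 RGB "images").
batch = np.zeros((120, 2, 2, 3), dtype="float32")
sequences = list(iter_sequences(batch, sequence_length=60))

print([s.shape for s in sequences])
```

With a batch of 120 maps and a sequence length of 60, this gives two sequences per batch.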
CNN Model
I build the CNN model based on a pre-trained DenseNet
from keras.layers import TimeDistributed, GRU
def build_convnet(shape=(600, 600, 3)):
    inputs = keras.Input(shape = shape)
    x = inputs
    # preprocessing
    x = keras.applications.densenet.preprocess_input(x)
    #Convbase
    x = convBase(x)
    x = layers.Flatten()(x)
    # Fine tuning
    x = keras.layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.2)(x)
    x = keras.layers.Dense(512, activation='relu')(x)
    x = keras.layers.GlobalMaxPool2D()
    return x
GRU Model
I build the time part of the network with a GRU layer
def action_model(shape=(15, 600, 600, 3), nbout=15):
    # Create our convnet with (112, 112, 3) input shape
    convnet = build_convnet(shape[1:]) #[1:]
    # then create our final model
    model = keras.Sequential()
    # add the convnet with (5, 112, 112, 3) shape
    model.add(TimeDistributed(convnet, input_shape=shape))
    # here, you can also use GRU or LSTM
    model.add(GRU(64))
    # and finally, we make a decision network
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(15, activation='softmax'))
    return model
Transfer Learning
I retrain part of the convolutional base
convBase = DenseNet121(include_top=False, weights=None, input_shape=(600,600,3), pooling="avg")
for layer in convBase.layers:
    if 'conv5' in layer.name:
        layer.trainable = True
for layer in convBase.layers:
    if 'conv4' in layer.name:
        layer.trainable = True
Model Compile
Model compilation ( image size= 600x600x3)
INSHAPE=(15, 600, 600, 3) # (5, 112, 112, 3)
model = action_model(INSHAPE, 1)
optimizer = keras.optimizers.Adam(0.001)
model.compile(
optimizer,
'categorical_crossentropy',
metrics='accuracy'
)
Model Fit
Here I manually batch my data. I turn an array of shape (60, 600, 600, 3) into one of shape (4, 15, 600, 600, 3), meaning 4 batches, each containing a 15-map-long sequence.
epochs = 10
for value in range(0, epochs):
    train_x, train_y = sequence_x(train_ds), sequence_y(train_ds)
    val_x, val_y = sequence_x(validation_ds), sequence_y(validation_ds)
    for i in range(0,278): #
        x = next(train_x, "none")
        y = next(train_y, "none")
        if (x!="none" or y!="none"):
            if (np.any(x) and np.any(y)):
                x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))
                y_stack = np.stack((y[:15], y[15:30], y[30:45], y[45:]))
                y_stack=y_stack.reshape(4,15)
                model.fit(x=x_stack, y=y_stack,
                          validation_data=None,
                          batch_size=None,
                          shuffle=False
                          )
            else:
                continue
        else:
            continue
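The re-batching step can also be written as a single reshape. The sketch below uses tiny 4x4 images and made-up labels as stand-ins, and checks that reshaping gives the same array as slicing and stacking four groups of 15.

```python
import numpy as np

time_steps = 15
# Stand-in for 60 consecutive maps (tiny 4x4 RGB images instead of 600x600).
x = np.arange(60 * 4 * 4 * 3, dtype="float32").reshape(60, 4, 4, 3)
y = np.arange(60) % 2  # hypothetical binary labels, one per map

# Slicing and stacking, as in the training loop.
x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))

# The same result with a single reshape: (60, H, W, C) -> (4, 15, H, W, C).
x_seq = x.reshape(-1, time_steps, *x.shape[1:])
y_seq = y.reshape(-1, time_steps)

print(x_seq.shape, y_seq.shape)  # (4, 15, 4, 4, 3) (4, 15)
```

The reshape version needs no hard-coded slice boundaries, as long as the number of maps is a multiple of the sequence length.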
The idea is to get a model that, when presented with a sequence of images, can label each one with a 1 if it contains a Medicane and a 0 if it does not.
The model does compile without any errors, but the results it provides are horrible.
What am I doing incorrectly? Is there a more effective way to write all of this?
Currently I am training a Word2Vec + CNN model for Twitter sentiment analysis in the COVID-19 vaccine domain. I use the pre-trained GoogleNewsVectorNegative300 word embedding. The problem is that the model overfits heavily during training. I use the pre-trained GoogleNewsVectorNegative300 embedding because performance was much worse when I trained my own Word2Vec on my own dataset. Here are the steps I carried out before fitting the model:
Text Pre processing:
Lower casing
Remove hashtag, mentions, URLs, numbers, change words to numbers, non-ASCII characters, retweets "RT"
Expand contractions
Replace negations with antonyms
Remove punctuation
Remove stopwords
Lemmatization
I split my dataset into 90:10 for train:test as follows:
def split_data(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X,
                                                        y,
                                                        train_size=0.9,
                                                        test_size=0.1,
                                                        stratify=y,
                                                        random_state=0)
    return X_train, X_test, y_train, y_test
The split results in a training set of 2060 samples: 708 in the positive sentiment class, 837 in the negative sentiment class, and 515 in the neutral sentiment class.
Then, I implemented the text augmentation that is EDA (Easy Data Augmentation) on all the training data as follows:
class TextAugmentation:
    def __init__(self):
        self.augmenter = EDA()
    def replace_synonym(self, text):
        augmented_text_portion = int(len(text)*0.1)
        synonym_replaced = self.augmenter.synonym_replacement(text, n=augmented_text_portion)
        return synonym_replaced
    def random_insert(self, text):
        augmented_text_portion = int(len(text)*0.1)
        random_inserted = self.augmenter.random_insertion(text, n=augmented_text_portion)
        return random_inserted
    def random_swap(self, text):
        augmented_text_portion = int(len(text)*0.1)
        random_swaped = self.augmenter.random_swap(text, n=augmented_text_portion)
        return random_swaped
    def random_delete(self, text):
        random_deleted = self.augmenter.random_deletion(text, p=0.5)
        return random_deleted
text_augmentation = TextAugmentation()
After augmentation, the training set has 10300 samples: 3540 in the positive sentiment class, 4185 in the negative sentiment class, and 2575 in the neutral sentiment class.
Then, I tokenized the sequence as follows:
# Tokenize the sequence
pfizer_tokenizer = Tokenizer(oov_token='OOV')
pfizer_tokenizer.fit_on_texts(df_pfizer_train['text'].values)
X_pfizer_train_tokenized = pfizer_tokenizer.texts_to_sequences(df_pfizer_train['text'].values)
X_pfizer_test_tokenized = pfizer_tokenizer.texts_to_sequences(df_pfizer_test['text'].values)
# Pad the sequence
X_pfizer_train_padded = pad_sequences(X_pfizer_train_tokenized, maxlen=100)
X_pfizer_test_padded = pad_sequences(X_pfizer_test_tokenized, maxlen=100)
pfizer_max_length = 100
pfizer_num_words = len(pfizer_tokenizer.word_index) + 1
# Encode label
y_pfizer_train_encoded = df_pfizer_train['sentiment'].factorize()[0]
y_pfizer_test_encoded = df_pfizer_test['sentiment'].factorize()[0]
y_pfizer_train_category = to_categorical(y_pfizer_train_encoded)
y_pfizer_test_category = to_categorical(y_pfizer_test_encoded)
This results in 8869 unique words and a maximum sequence length of 100.
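One detail in the label encoding may be worth a check. pandas' factorize assigns integer codes in order of first appearance, so calling it separately on the training and test frames can give the same sentiment different codes in each split. The sketch below mimics that behaviour in plain Python (this factorize is a stand-in written for illustration, and the label lists are made up) and then builds one shared mapping from the training labels.

```python
def factorize(values):
    """Assign integer codes in order of first appearance."""
    codes, uniques = [], []
    for value in values:
        if value not in uniques:
            uniques.append(value)
        codes.append(uniques.index(value))
    return codes, uniques

# Hypothetical label columns whose first rows differ between the splits.
train_labels = ["negative", "positive", "neutral", "negative"]
test_labels = ["positive", "neutral", "negative"]

train_codes, train_uniques = factorize(train_labels)
test_codes, test_uniques = factorize(test_labels)
print(train_uniques)  # ['negative', 'positive', 'neutral']
print(test_uniques)   # ['positive', 'neutral', 'negative']

# One shared mapping, built from the training labels only.
label_to_code = {label: code for code, label in enumerate(train_uniques)}
test_codes_shared = [label_to_code[label] for label in test_labels]
print(test_codes_shared)  # [1, 2, 0]
```

If the two splits happen to start with the same label order, the codes agree by chance. A shared mapping removes that dependence on row order.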
Finally, I fit my model, which uses the pre-trained GoogleNewsVectorNegative300 word embedding and a CNN, and I split my training data again, holding out 10% for validation, as follows:
# Build single CNN model
def build_cnn_model(embedding_matrix, max_sequence_length):
    # Input layer
    input_layer = Input(shape=(max_sequence_length,))
    # Word embedding layer
    embedding_layer = Embedding(input_dim=embedding_matrix.shape[0],
                                output_dim=embedding_matrix.shape[1],
                                weights=[embedding_matrix],
                                input_length=max_sequence_length,
                                trainable=True)(input_layer)
    # CNN model layer
    cnn_layer = Conv1D(filters=256,
                       kernel_size=2,
                       strides=1,
                       padding='valid',
                       activation='relu')(embedding_layer)
    cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
    cnn_layer = Dropout(rate=0.5)(cnn_layer)
    batch_norm_layer = BatchNormalization()(cnn_layer)
    cnn_layer = Conv1D(filters=256,
                       kernel_size=2,
                       strides=1,
                       padding='valid',
                       activation='relu')(batch_norm_layer)
    cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
    cnn_layer = Dropout(rate=0.5)(cnn_layer)
    batch_norm_layer = BatchNormalization()(cnn_layer)
    cnn_layer = Conv1D(filters=256,
                       kernel_size=2,
                       strides=1,
                       padding='valid',
                       activation='relu')(batch_norm_layer)
    cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
    cnn_layer = Dropout(rate=0.5)(cnn_layer)
    batch_norm_layer = BatchNormalization()(cnn_layer)
    flatten = Flatten()(batch_norm_layer)
    # Dense model layer
    dense_layer = Dense(units=10, activation='relu')(flatten)
    batch_norm_layer = BatchNormalization()(dense_layer)
    output_layer = Dense(units=3, activation='softmax')(batch_norm_layer)
    cnn_model = Model(inputs=input_layer, outputs=output_layer)
    return cnn_model
sinovac_cnn_history = sinovac_cnn_model.fit(x=X_sinovac_train,
y=y_sinovac_train,
batch_size=128,
epochs=100,
validation_split=0.1,
verbose=1)
The training result:
I really need some suggestions or insights, because I have been working on this without any improvement in my model's performance.
That's quite a complex problem. It sure looks like overfitting as you said yourself. Meaning the model can't generalize well from your training set to new data.
Intuitively, I would suggest that you vary the hyperparameters (epochs, batch size, learning rate, dropout layers), if you haven't already, to look for a better combination. I would also suggest using cross-validation to get a better idea of your classifier's performance. This would also shuffle the training data and help keep the model from learning the data by heart.
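As a rough sketch of the cross-validation idea, the helper below (k_fold_indices is a hypothetical name) shuffles the sample indices and splits them into folds using only NumPy. It assumes 2060 training samples, the size mentioned in the question, and leaves the model building and fitting as a comment.

```python
import numpy as np

def k_fold_indices(n_samples, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs over shuffled sample indices."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_splits)
    for i in range(n_splits):
        val_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, val_idx

# 2060 is the training-set size mentioned in the question.
splits = list(k_fold_indices(2060, n_splits=5))
for train_idx, val_idx in splits:
    # Build and fit a fresh model on train_idx here, then evaluate on val_idx.
    # If augmentation is used, applying it to the train_idx rows only keeps
    # augmented copies of validation tweets out of the training data.
    print(len(train_idx), len(val_idx))
```

Averaging the validation scores over the folds gives a steadier estimate than a single split.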
Have you tried classifying the original data without the data augmentation? It's not a lot of data, but it could be enough to see if the performance on the test set is better than the final version, and thus see whether the data augmentation might be screwing something up in your data.
Did you try another embedding? I don't really think this is the problem, but in the search for the error I would probably switch it to see what happens.
Last but not least, do you know for a fact that this model structure can handle this task? Meaning, did you find a working example somewhere? It sure sounds like it could do it, but there is a chance that the CNN model, for example, just doesn't generalize well over the embeddings. Have you considered using a model specialized for text classification, like a Transformer or an LSTM?
I have an LSTM architecture ready:
input1 = Input(shape=(1500, 3))
lstm = LSTM(units=100, return_sequences=False, activation='relu')(input1)
outputs = Dense(150, activation="sigmoid")(lstm)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss="binary_crossentropy", optimizer="adam",
metrics=["accuracy"])
The LSTM layer supports a calling argument called mask.
The way I'm reading the data is by using two generators, one iterates through training files and the other through the validation files (so on the .fit method I pass the training and validation generators).
model.fit(
x=training_generator,
epochs=10,
steps_per_epoch=5, # there are 5 files
validation_data=validation_generator,
validation_steps=5, # there are 5 files
verbose=1
)
So each file has its own mask (one for the training file, another for the validation file). My question is: how can I specify which mask to use?
The way I found to work was to transform the data during the preprocessing stage. If you replace the values in your data, according to the mask, with a number you know is not in your data, for instance 0 or -999, you can then add another layer to the architecture called Masking. This layer has a parameter called mask_value, which will be the same number you used to transform your data:
input1 = Input(shape=(n_timesteps, n_channels))
masking = Masking(mask_value=-999)(input1)
lstm1 = LSTM(units=100, return_sequences=False,
activation="tanh")(masking)
outputs = Dense(n_timesteps, activation="sigmoid")(lstm1)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss=keras.losses.BinaryCrossentropy(),
optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
This way you can then pass this as the input to the LSTM. LSTMs support masking; some other types of layers do not.
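The preprocessing step can be sketched with NumPy as follows. It assumes the data has shape (samples, timesteps, channels) and that each file comes with a boolean mask of shape (samples, timesteps); apply_mask and the tiny arrays are hypothetical. Keras' Masking layer skips a timestep only when every feature at that timestep equals mask_value, so all channels are overwritten.

```python
import numpy as np

MASK_VALUE = -999.0  # must not occur in the real data

def apply_mask(data, keep):
    """Overwrite masked-out timesteps with MASK_VALUE.

    data: (samples, timesteps, channels)
    keep: boolean (samples, timesteps), True where a timestep should be used.
    """
    masked = data.astype("float32")  # astype returns a copy by default
    masked[~keep] = MASK_VALUE       # sets every channel of those timesteps
    return masked

# Tiny stand-in for one file: 2 samples, 4 timesteps, 3 channels.
data = np.ones((2, 4, 3))
keep = np.array([[True, True, False, False],
                 [True, False, True, True]])
masked = apply_mask(data, keep)

print(masked[0, :, 0])
```

Since each file has its own mask, a generator could call a helper like this on every file before yielding it, so the model never needs to be told which mask is in use.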
I want to build a model in which the outputs of many smaller models are merged into one. I want 146 networks, each taking 17 inputs and giving a probability as output. The outputs of all these networks need to be merged and used as a single unit. To do this, I did something like this:
def build(layer_str,actv):
    #take the input layer structure and convert it into a list
    layers=layer_str.split("-")
    #print(layers)
    #convert the strings in the list to integer
    layers=list(map(int,layers))
    #let's build our model
    model= tf.keras.Sequential()
    #we add the first layer and the input layer to our network
    model.add(Dense(layers[1],input_shape=(layers[0],),activation=actv[0]))
    #we add the hidden layers
    for (x,i) in enumerate(layers):
        if(x>1 and x!=(len(layers)-1)):
            model.add(Dense(i,activation=actv[x]))
    #then add the final layer
    model.add(Dense(layers[-1],activation=actv[-1]))
    #return the constructed model
    return model
Then I merged the models like this:
def Merge_model(layer,act,data,label,lr,epochs,batch_size):
    model_list=[]
    for i in range(146):
        model=nn.build(layer,act)
        model_list.append(model)
    merged_layers = concatenate([model_list[i].output for i in range(146)])
    x = merged_layers
    out = Activation('sigmoid')(x)
    merged_model = Model([model_list[i].input for i in range(146)], [out])
    print(merged_model.summary())
    merged_model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
    result,predictions=nn.train_eval(data,label,merged_model,lr,epochs,batch_size)
data=np.random.rand(10,146,17)
data=[d for d in data]
label=np.random.randint(0,1,(10,146,1))
label=[lb for lb in label]
print(len(label[0]))
lr=0.01
epochs=100
batch_size=16
Merge_model("17-7-1",["relu","sigmoid"],data,label,lr,epochs,batch_size)
I get the model summary below, but I do not understand what to make of it. What should the shapes of my training data and layers be?
https://drive.google.com/file/d/1juffdLY0i9f9rgldKfHG_MYXCK8wBV09/view?usp=sharing
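For the shapes, a sketch under the assumption that the merged model has 146 inputs of shape (17,) and one concatenated output of shape (146,): the model would then expect a list of 146 arrays, each of shape (samples, 17), and a single target array of shape (samples, 146). The dummy data below follows the (samples, networks, features) layout used in the question.

```python
import numpy as np

n_samples, n_networks, n_features = 10, 146, 17

# Dummy data laid out as (samples, networks, features), as in the question.
data = np.random.rand(n_samples, n_networks, n_features)
# randint's upper bound is exclusive, so (0, 2) is needed to draw both 0 and 1.
labels = np.random.randint(0, 2, (n_samples, n_networks, 1))

# One array per sub-network input, each of shape (samples, 17).
inputs = [data[:, i, :] for i in range(n_networks)]
# Concatenating 146 outputs of shape (samples, 1) gives (samples, 146).
targets = labels.reshape(n_samples, n_networks)

print(len(inputs), inputs[0].shape, targets.shape)  # 146 (10, 17) (10, 146)
# merged_model.fit(inputs, targets, ...) would then match the model's inputs and output.
```

Splitting along the first axis, as `[d for d in data]` does, gives 10 arrays of shape (146, 17) instead, which does not line up with 146 separate inputs.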
I have two arrays, "train_vol" and "train_time", each with shape (164, 6790), and one array, "train_cap", with shape (164, 1). I want to train the model with train_vol and train_time as inputs and train_cap as the output. The validation inputs are val_vol and val_time, each with shape (42, 6790), and the validation output is val_cap, with shape (42, 1).
I am trying to use model.fit() to train the model. I tried giving two arrays as the input x and one array as the output y, but I get an error.
The documentation says I can pass a list of arrays as input, so I tried that, but I still get an error. Can anyone tell me where I made a mistake?
You can create a model that takes multiple inputs and produces multiple outputs using the functional API.
def create_model3():
    input1 = tf.keras.Input(shape=(13,), name = 'I1')
    input2 = tf.keras.Input(shape=(6,), name = 'I2')
    hidden1 = tf.keras.layers.Dense(units = 4, activation='relu')(input1)
    hidden2 = tf.keras.layers.Dense(units = 4, activation='relu')(input2)
    merge = tf.keras.layers.concatenate([hidden1, hidden2])
    hidden3 = tf.keras.layers.Dense(units = 3, activation='relu')(merge)
    output1 = tf.keras.layers.Dense(units = 2, activation='softmax', name ='O1')(hidden3)
    output2 = tf.keras.layers.Dense(units = 2, activation='softmax', name = 'O2')(hidden3)
    model = tf.keras.models.Model(inputs = [input1,input2], outputs = [output1,output2])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
You can specify the connection between your layers using the syntax above. Your model can have more than 2 inputs. The model constructed using above code looks like this.
Note that 13 and 6 in the Input layers represent features in your respective data.
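To make the note about feature counts concrete, here is a small sketch using stand-in arrays with the shapes given in the question. It assumes the two inputs are named 'I1' and 'I2' as in the model above; the Input shape is simply the array shape without the leading sample dimension.

```python
import numpy as np

# Stand-ins with the shapes given in the question.
train_vol = np.zeros((164, 6790), dtype="float32")
train_time = np.zeros((164, 6790), dtype="float32")
train_cap = np.zeros((164, 1), dtype="float32")

# The Input shape is the array shape without the leading sample dimension.
vol_shape = train_vol.shape[1:]
time_shape = train_time.shape[1:]
print(vol_shape, time_shape)  # (6790,) (6790,)

# Every input and output must have the same number of samples.
assert train_vol.shape[0] == train_time.shape[0] == train_cap.shape[0]

# With inputs named 'I1' and 'I2', the arrays would be passed as:
x = {"I1": train_vol, "I2": train_time}
```

For the arrays in the question, both Input layers would therefore use shape=(6790,) instead of 13 and 6.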
For training the model you can use the following syntax:
history = model.fit(
    x = {'I1':train_data, 'I2':new_train_data},
    y = {'O1':train_labels, 'O2': new_train_labels},
    batch_size = 32,
    epochs = 10,
    verbose = 1,
    callbacks = None,
    # validation_data is a tuple of (inputs, targets)
    validation_data = ([val_data, new_val_data], [val_labels, new_val_labels])
)
Here train_data and new_train_data are two separate data entities.
Note: You can also pass List instead of dictionary but the dictionary is more readable in terms of coding.
For more information on the functional API, see the Keras Functional API guide.
Try passing the arguments as
model.fit(x=(train_vol, train_time), ..)
Use () instead of []. The reason is that the model can't tell whether you are passing a single dataset with two samples or two parallel datasets. With (), it is read as two datasets; with [], it is read as one dataset with two samples.