A Keras sequential model with an embedding layer needs to be retrained starting from the currently known weights.
A Keras sequential model is trained on the provided (text) training data. The training data is tokenized by a (custom-made) tokenizer. The input dimension of the first layer in the model, an embedding layer, is the number of words known by the tokenizer.
After a few days, additional training data becomes available. The tokenizer needs to be refitted on this new data, as it may contain additional words. That means the input dimension of the embedding layer changes, so the previously trained model is no longer usable.
self.model = Sequential()
self.model.add(Embedding(tokenizer.totalDistinctWords + 1,
                         hiddenSize + 1, batch_size=1,
                         input_length=int(self.config['numWords'])))
self.model.add(LSTM(hiddenSize, return_sequences=True,
                    stateful=True, activation='tanh', dropout=dropout))
self.model.add(LSTM(hiddenSize, return_sequences=True,
                    stateful=True, activation='tanh', dropout=dropout))
self.model.add(TimeDistributed(Dense(
    len(self.controlSupervisionConfig.predictableOptionsAsList))))
self.model.add(Activation('softmax'))
I want to use the previously trained model as initializer for the new training session. For the new words in the tokenizer, the embedding layer should just use a random initialization. For the words already known by the tokenizer, it should use the previously trained embedding.
You can access (get and set) a layer's weights directly as NumPy arrays with code like weights = model.layers[0].get_weights() and model.layers[0].set_weights(weights), where model.layers[0] is your Embedding layer. This way you can store embeddings separately and set the known embeddings by copying them from the stored data.
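For instance, here is a minimal sketch of that idea. It assumes both tokenizers expose a word-to-index mapping called word_index (as Keras' own Tokenizer does; adapt the attribute name to your custom tokenizer), and that build_model is a hypothetical helper that rebuilds the architecture above with the enlarged vocabulary:

import numpy as np

# Trained embedding matrix of the old model: shape (old_vocab + 1, hiddenSize + 1)
old_weights = old_model.layers[0].get_weights()[0]

# Fresh model built with the refitted tokenizer; its embedding starts out random
new_model = build_model(new_tokenizer)
new_weights = new_model.layers[0].get_weights()[0]

# Copy the trained vectors for every word the old tokenizer already knew;
# genuinely new words simply keep their random initialization
for word, new_idx in new_tokenizer.word_index.items():
    old_idx = old_tokenizer.word_index.get(word)
    if old_idx is not None and old_idx < old_weights.shape[0]:
        new_weights[new_idx] = old_weights[old_idx]

new_model.layers[0].set_weights([new_weights])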
How to select number of hidden layers and number of memory cells in LSTM?
I want to make an LSTM model for classification.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(44000, 32))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
The number of memory cells is set by passing the input_length parameter to your Embedding layer, as it is defined by the length of your input sequences. This argument is optional and can be inferred when training data is provided.
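For example, to fix the sequence length explicitly (100 is an arbitrary, assumed value here), the Embedding line above becomes:

model.add(Embedding(44000, 32, input_length=100))  # sequences padded to 100 tokens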
You can increase the number of hidden LSTM layers by simply adding more. However, you need to set return_sequences=True on the intermediate LSTM layers to preserve the temporal dimension, i.e.,
model = Sequential()
model.add(Embedding(44000, 32))  # 32-dim encoding is pretty small
model.add(LSTM(32, return_sequences=True))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
gives two LSTM layers.
There is a pretty comprehensive guide to using RNNs for text classification in the TensorFlow documentation.
I am using an LSTM for fake news detection and added an embedding layer to my model.
It is working fine without adding any input_shape in the LSTM function, but I thought the input_shape parameter was mandatory. Could someone help me with why there is no error even without defining input_shape? Is it because the embedding layer implicitly defines the input_shape?
Following is the code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()
embedding_layer = Embedding(total_words, embedding_dim, weights=[embedding_matrix], input_length=max_length)
model.add(embedding_layer)
model.add(LSTM(64))
model.add(Dense(1, activation='sigmoid'))
opt = SGD(learning_rate=0.01, decay=1e-6)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=['accuracy'])
model.fit(data, train['label'], epochs=30, verbose=1)
You only need to provide an input_length to the Embedding layer. Furthermore, if you use a sequential model, you do not need to provide an input layer. Omitting the input layer means that your model's weights are only created when you pass real data, as you did in model.fit(...). If you wanted to see the weights of your model before providing real data, you would have to define an input layer before your Embedding layer, like this:
embedding_input = tf.keras.layers.Input(shape=(max_length,))
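Putting that together (a sketch reusing embedding_layer and max_length from the question), the explicit input layer makes the weights available before any data is passed:

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=(max_length,)))
model.add(embedding_layer)
model.add(tf.keras.layers.LSTM(64))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
model.summary()  # the weights already exist, before any call to fit()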
And yes, as you mentioned, your model infers the input_shape implicitly when you provide the real data. Your LSTM layer does not need an input_shape, as it is derived from the output of your Embedding layer. If the LSTM layer were the first layer of your model, it would be best to specify an input_shape for clarity. For example:
model = tf.keras.Sequential()
model.add(tf.keras.layers.LSTM(32, input_shape=(10, 5)))
model.add(tf.keras.layers.Dense(1))
where 10 represents the number of time steps and 5 the number of features. In your example, the input to the LSTM layer has the shape (max_length, embedding_dim). Here too, if you do not specify the input_shape, your model will infer the shape from your input data.
For more information check out the Keras documentation.
In Keras, I would like to reuse the initial layers of a trained neural network, together with the weights obtained during the training process.
Going to the case: let's imagine we have a dataset df; after splitting it into train, dev and test sets, we train a neural network, in this example an autoencoder.
A real piece of code illustrating this concept, without providing data (I didn't consider it necessary):
from keras.models import Model
from keras.layers import Activation, Dense, Dropout, Input

encoding_dim = 32  # size of the latent representation (example value)

# Define input layer
input_data = Input(shape=(train.shape[1],), name='Input')
# Define encoding layer
encoded = Dense(encoding_dim, activation='relu')(input_data)
# Define decoding layer
decoded = Dense(train.shape[1], activation='sigmoid')(encoded)

# Create the autoencoder model
autoencoder = Model(input_data, decoded, name='Simple_AutoEncoder')

# Compile the autoencoder model
autoencoder.compile(optimizer='rmsprop',
                    loss='binary_crossentropy')

autoencoder.fit(train, train,
                epochs=50,
                batch_size=256,
                shuffle=True,
                validation_data=(dev_x, dev_x), verbose=0)
After compiling and fitting the model, we have a neural network with the weights obtained from the fitting process.
How could I use only the encoder part of this net while preserving the weights I got?
I believe something along these lines should do the trick:
#...all the code from above, including training...
# Define the encoder model
encoder = Model(input_data, encoded, name='Encoder')
The encoder model can be treated as a fully-fledged Keras model (you can save/load/fit/evaluate/predict).
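For example (a quick sketch; test is assumed to be prepared the same way as train above):

# Project new samples into the latent space with the trained encoder
codes = encoder.predict(test)

# The encoder can also be persisted on its own
encoder.save('encoder.h5')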
By training the autoencoder, the encoder part is created from the same encoded tensor and therefore carries the trained weights of the autoencoder.
# Getting the trained weights of the first layer (the encoder's dense layer)
weights_ae = autoencoder.layers[1].get_weights()[0]

# The previous code of the example...
# Creating the encoder model
encoder = Model(input_data, encoded, name='Encoder')

# Getting the weights of the encoder model
weights_e = encoder.layers[1].get_weights()[0]
So, finally, it is confirmed that the encoder model created this way has the weights ("trained experience") from the autoencoder.
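A quick sanity check of that claim:

import numpy as np

# Both models reference the same underlying layer, so the arrays match exactly
assert np.array_equal(weights_ae, weights_e)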
I am using an RNN to build a simple classifier that assigns a paragraph of words to one of several categories. It has an embedding layer, followed by RNN layers and then a Dense layer, as shown below.
It predicts correctly, but beyond the prediction itself, how can I know why the RNN arrived at it? For example, what weight does each word of the paragraph carry?
Which words have made the RNN believe the paragraph belongs to a specific category?
model = Sequential()
embedding_size = 300
model.add(Embedding(input_dim=num_words + 1, output_dim=embedding_size,
                    input_length=max_tokens, name='layer_embedding',
                    weights=[embedding_matrix], trainable=True))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Bidirectional(GRU(32, return_sequences=True)))
model.add(Bidirectional(GRU(32)))
model.add(Dense(numdense, activation='sigmoid'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
This post on GitHub proposes a way to print the parameters together with their names:
for e in zip(model.layers[0].trainable_weights, model.layers[0].get_weights()):
    print('Param %s:\n%s' % (e[0], e[1]))
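That prints the raw parameters, but it does not by itself explain which words drove a particular prediction. One simple, model-agnostic idea (an occlusion test; not from the GitHub post, and word_importance/pad_id are hypothetical names) is to mask each token in turn and measure how much the output changes. The token sequence is assumed to be already padded to the model's input_length:

import numpy as np

def word_importance(model, tokens, pad_id=0):
    """Score each token by how much masking it changes the model's output."""
    base = model.predict(np.array([tokens]), verbose=0)[0]
    scores = []
    for i in range(len(tokens)):
        masked = list(tokens)
        masked[i] = pad_id  # replace token i with the padding id
        pred = model.predict(np.array([masked]), verbose=0)[0]
        scores.append(float(np.abs(base - pred).sum()))
    return scores

Tokens with the largest scores are the ones the network leans on most for that paragraph.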
I have a dataset with 5K rows (1K of which are held out for validation) and 17 columns, including the last one (the target, an integer binary label).
My model is simply this 2-layer LSTM:
model = Sequential()
model.add(Embedding(output_dim=64, input_dim=17))
model.add(LSTM(32, return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(1))
model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              class_mode='binary')
After loading my dataset with pandas
import pandas as pd

df_train = pd.read_csv(train_file)
train_X, train_y = df_train.values[:, :-1], df_train['target'].values
and trying to run my model, I get this error:
Exception: When using TensorFlow, you should define explicitly the number of timesteps of your sequences. - If your first layer is an Embedding, make sure to pass it an "input_length" argument. Otherwise, make sure the first layer has an "input_shape" or "batch_input_shape" argument, including the time axis.
What should I put in input_length? The total rowcount?
Since my data has shapes train_X=(4000, 17) and train_y=(4000,), how can I prepare it to feed this kind of model? Do I have to change the shape of my input data?
Thanks for any help!! (=
It looks like Keras uses static unrolling to build recurrent networks (such as LSTMs) on TensorFlow. input_length should be the length of the longest sequence you want to train on: so if each row of your CSV file train_file is a comma-delimited sequence of symbols, it should be the number of symbols in the longest row.
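For example (a sketch under that interpretation; sequences and vocab_size are assumed to come from parsing the CSV rows into lists of symbol ids):

from keras.models import Sequential
from keras.layers import Embedding
from keras.preprocessing.sequence import pad_sequences

# Pad every row to the length of the longest one
max_len = max(len(seq) for seq in sequences)
train_X = pad_sequences(sequences, maxlen=max_len)

# The Embedding layer then receives that length explicitly
model = Sequential()
model.add(Embedding(output_dim=64, input_dim=vocab_size, input_length=max_len))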